Synthetic Minority Over-Sampling for Improving Imbalanced Data in Educational Web Usage Mining
Authors
Abstract
Similar resources
Classification of Imbalanced Data Using Synthetic Over-Sampling Techniques
Blending Propensity Score Matching and Synthetic Minority Over-sampling Technique for Imbalanced Classification
Real-world data sets often contain disproportionate sample sizes of observed groups, making the task of prediction algorithms very difficult. One of the many ways to combat inherent bias from class-imbalanced data is to perform re-sampling. In this paper we discuss two popular re-sampling approaches proposed in the literature, Synthetic Minority Over-sampling Technique (SMOTE) and Propensity Score Mat...
Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling TEchnique for Handling the Class Imbalanced Problem
The class imbalance problem occurs in various disciplines when one of the target classes has a tiny number of instances compared to the other classes. A typical classifier normally ignores or fails to detect a minority class because of its small number of instances. SMOTE is one of the over-sampling techniques that remedies this situation. It generates minority instances within the overlapping regio...
Borderline over-sampling for imbalanced data classification
Traditional classification algorithms often perform poorly on imbalanced data sets in which some classes are heavily outnumbered by the remaining classes. For this kind of data, minority class instances, which are usually of much greater interest, are often misclassified. The paper proposes a method to deal with them by changing class distribution through oversampling at the borderline b...
SMOTE: Synthetic Minority Over-sampling Technique
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of “normal” examples with only a small percentage of “abnormal” or “interesting” examples. It is also the case that the cost of misclassifying an abnormal (i...
Journal
Journal title: ECTI Transactions on Computer and Information Technology (ECTI-CIT)
Year: 2019
ISSN: 2286-9131
DOI: 10.37936/ecti-cit.2018122.133280